Association Rule Mining: A Survey

نویسندگان

  • Qiankun Zhao
  • Sourav S. Bhowmick
  • S. S. Bhowmick
چکیده

Data mining [Chen et al. 1996] is the process of extracting interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from large information repositories such as: relational database, data warehouses, XML repository, etc. Also data mining is known as one of the core processes of Knowledge Discovery in Database (KDD). Many people take data mining as a synonym for another popular term, Knowledge Discovery in Database (KDD). Alternatively other people treat Data Mining as the core process of KDD. The KDD processes are shown in Figure 1 [Han and Kamber 2000]. Usually there are three processes. One is called preprocessing, which is executed before data mining techniques are applied to the right data. The preprocessing includes data cleaning, integration, selection and transformation. The main process of KDD is the data mining process, in this process different algorithms are applied to produce hidden knowledge. After that comes another process called postprocessing, which evaluates the mining result according to users’ requirements and domain knowledge. Regarding the evaluation results, the knowledge can be presented if the result is satisfactory, otherwise we have to run some or all of those processes again until we get the satisfactory result. The actually processes work as follows. First we need to clean and integrate the databases. Since the data source may come from different databases, which may have some inconsistences and duplications, we must clean the data source by removing those noises or make some compromises. Suppose we have two different databases, different words are used to refer the same thing in their schema. When we try to integrate the two sources we can only choose one of them, if we know that they denote the same thing. And also real world data tend to be incomplete and noisy due to the manual input mistakes. The integrated data sources can be stored in a database, data warehouse or other repositories. As not all the data in the database are related to our mining task, the second process is to select task related data from the integrated resources and transform them into a format that is ready to be mined. Suppose we want to find which items are often purchased together in a supermarket, while the database that records the purchase history may contains customer ID, items bought, transaction time, prices, number of each items and so on, but for this specific task we only need items bought. After selection of relevant data, the database that we are going to apply our data mining techniques to will be much smaller, consequently the whole process will be

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Association Rule Mining in E- Commerce: a Survey

Association Rule mining is one of the most popular data mining techniques which can be defined as extracting the interesting correlation and relation among large volume of transactions. E-commerce applications generate huge amount of operational and behavioral data. Applying association rule mining in e-commerce application can unearth the hidden knowledge from these data. In this paper a surve...

متن کامل

Data sanitization in association rule mining based on impact factor

Data sanitization is a process that is used to promote the sharing of transactional databases among organizations and businesses, it alleviates concerns for individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...

متن کامل

Numeric Multi-Objective Rule Mining Using Simulated Annealing Algorithm

Abstract as a single objective one. Measures like support, confidence and other interestingness criteria which are used for evaluating a rule, can be thought of as different objectives of association rule mining problem. Support count is the number of records, which satisfies all the conditions that exist in the rule. This objective represents the accuracy of the rules extracted from the da...

متن کامل

A Survey on Spatial Association Rule Mining Technique and Algorithms for mining spatial data

spatial association rule mining is an important technique of spatial data mining. Mining spatial association rule is one of the most important branches in the field of spatial data, spatial data mining can extract the spatial patterns and characteristics, general relations of spatial and non spatial data and other data features in common that hidden in spatial database. This paper describes and...

متن کامل

Survey: Privacy Preservation Association Rule Mining

Privacy is an important issue when one wants to make use of data that involves individuals’ sensitive information. Research on protecting the privacy of individuals and the confidentiality of data has received contributions from many fields, including computer science, statistics, economics, and social science. In this paper, we survey research work in privacy-preserving Data Publishing and Ass...

متن کامل

An Extensive Survey on Association Rule Mining Algorithms

Association rule mining (ARM) aims at extraction, hidden relation, and interesting associations between the existing items in a transactional database. The purpose of this study is to highlight fundamental of association rule mining, association rule mining approaches to mining association rule, various algorithm and comparison between algorithms. Keywords—Assocation Rule Mining; Bottom up Appr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003